SEVA: Leveraging sketches to evaluate alignment between human and machine visual abstraction
Sketching is a powerful tool for creating abstract images that are sparse but meaningful. Sketch understanding poses fundamental challenges for general-purpose vision algorithms because it requires robustness to the sparsity of sketches relative to natural visual inputs and because it demands tolerance for semantic ambiguity, as sketches can reliably evoke multiple meanings. While current vision algorithms have achieved high performance on a variety of visual tasks, it remains unclear to what extent they understand sketches in a human-like way. Here we introduce $\texttt{SEVA}$, a new benchmark dataset containing approximately 90K human-generated sketches of 128 object concepts produced under different time constraints, and thus systematically varying in sparsity. We evaluated a suite of state-of-the-art vision algorithms on their ability to correctly identify the target concept depicted in these sketches and to generate responses that are strongly aligned with human response patterns on the same sketch recognition task. We found that vision algorithms that better predicted human sketch recognition performance also better approximated human uncertainty about sketch meaning, but there remains a sizable gap between model and human response patterns. To explore the potential of models that emulate human visual abstraction in generative tasks, we conducted further evaluations of a recently developed sketch generation algorithm (Vinker et al., 2022) capable of generating sketches that vary in sparsity. We hope that public release of this dataset and evaluation protocol will catalyze progress towards algorithms with enhanced capacities for human-like visual abstraction.
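The notion of alignment between model and human response patterns can be made concrete with a simple divergence measure. The sketch below is purely illustrative (it is not SEVA's actual evaluation protocol, and the distributions and helper name are made up): it compares a model's output distribution against a human response distribution over candidate concepts using KL divergence, showing that a model can get the top answer right while still diverging sharply from human uncertainty.

```python
import math

def kl_divergence(human, model, eps=1e-9):
    """KL(human || model) between two response distributions over concepts."""
    return sum(h * math.log((h + eps) / (m + eps))
               for h, m in zip(human, model) if h > 0)

# Toy example: probability distributions over 4 candidate concepts for one sketch
human   = [0.60, 0.30, 0.10, 0.00]  # humans mostly say concept 0, with some uncertainty
model_a = [0.55, 0.30, 0.10, 0.05]  # closely tracks the human uncertainty pattern
model_b = [0.97, 0.01, 0.01, 0.01]  # overconfident: right answer, wrong uncertainty

# model_a is better aligned with human responses than model_b
aligned = kl_divergence(human, model_a) < kl_divergence(human, model_b)
```

A lower divergence here rewards matching the whole human response pattern, not just the modal answer, which is the distinction the benchmark draws between accuracy and alignment.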
A Computer Vision Approach for Autonomous Cars to Drive Safe at Construction Zone
Ahammed, Abu Shad, Hossain, Md Shahi Amran, Obermaisser, Roman
To build a smarter and safer city, a secure, efficient, and sustainable transportation system is a key requirement. The autonomous driving system (ADS) plays an important role in the development of smart transportation and is considered one of the major challenges facing the automotive sector in recent decades. A car equipped with an ADS offers various cutting-edge functionalities such as adaptive cruise control, collision alerts, and automated parking. A primary area of research within advanced driver-assistance systems (ADAS) involves identifying road obstacles in construction zones regardless of the driving environment. This paper presents an innovative and highly accurate road obstacle detection model utilizing computer vision technology that can be activated in construction zones and functions under diverse drift conditions, ultimately contributing to building a safer road transportation system. The model, developed with the YOLO framework, achieved a mean average precision exceeding 94% and demonstrated an inference time of 1.6 milliseconds on the validation dataset, underscoring the robustness of the methodology applied to mitigate hazards and risks for autonomous vehicles.
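For context on the reported metric: average precision summarizes the precision-recall curve over confidence-ranked detections. The self-contained sketch below (illustrative only, not the paper's evaluation code) computes AP for a single class, assuming each detection has already been matched against ground truth:

```python
def average_precision(detections, num_gt):
    """AP from detections given as (confidence, is_true_positive) pairs.

    num_gt is the number of ground-truth objects for this class.
    """
    ranked = sorted(detections, key=lambda d: -d[0])  # highest confidence first
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, hit in ranked:
        tp += hit
        fp += 1 - hit
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)  # accumulate area under the PR curve
        prev_recall = recall
    return ap

# 3 detections against 2 ground-truth objects: hits at ranks 1 and 3
ap = average_precision([(0.9, 1), (0.8, 0), (0.6, 1)], num_gt=2)
```

Mean average precision (mAP), as reported in the abstract, is this quantity averaged across all object classes.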
Exposing the Unseen: Exposure Time Emulation for Offline Benchmarking of Vision Algorithms
Gamache, Olivier, Fortin, Jean-Michel, Boxan, Matěj, Pomerleau, François, Giguère, Philippe
Visual Odometry (VO) is one of the fundamental tasks in computer vision for robotics. However, its performance is deeply affected by High Dynamic Range (HDR) scenes, which are omnipresent outdoors. While new Automatic-Exposure (AE) approaches to mitigate this have appeared, comparing them in a reproducible manner is problematic. This stems from the fact that the behavior of AE depends on the environment and affects the image acquisition process. Consequently, AE has traditionally only been benchmarked in an online manner, making the experiments non-reproducible. To solve this, we propose a new methodology based on an emulator that can generate images at any exposure time. It leverages BorealHDR, a unique multi-exposure stereo dataset collected over 8.4 km, on 50 trajectories with challenging illumination conditions. Moreover, it contains pose ground truth for each image and a global 3D map based on lidar data. We show that using these images acquired at different exposure times, we can emulate realistic images, keeping the Root-Mean-Square Error (RMSE) below 1.78% compared to ground-truth images. To demonstrate the practicality of our approach for offline benchmarking, we compared three state-of-the-art AE algorithms on key elements of the Visual Simultaneous Localization And Mapping (VSLAM) pipeline against four baselines. Consequently, reproducible evaluation of AE is now possible, speeding up the development of future approaches. Our code and dataset are available online at this link: https://github.com/norlab-ulaval/BorealHDR
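The core idea of emulating an image at an arbitrary exposure time from a multi-exposure bracket can be sketched under a linear radiometric model: pixel intensity is approximately scene radiance times exposure time, clipped at sensor saturation. The snippet below is a minimal illustration under that assumption only; the BorealHDR emulator itself is more sophisticated, and all names here are hypothetical.

```python
def emulate_exposure(bracket, target_t, max_val=255.0):
    """Emulate an image at exposure time target_t from a multi-exposure bracket.

    bracket: list of (exposure_time, pixels), pixels as flat intensity lists.
    Assumes a linear response: intensity ~= radiance * exposure_time,
    clipped at the sensor's saturation level max_val.
    """
    n = len(bracket[0][1])
    radiance = []
    for i in range(n):
        # Estimate per-pixel radiance from well-exposed (unsaturated) samples
        samples = [px[i] / t for t, px in bracket if 0 < px[i] < max_val]
        if samples:
            radiance.append(sum(samples) / len(samples))
        else:
            # Saturated everywhere: fall back to a lower bound from the longest exposure
            radiance.append(bracket[-1][1][i] / bracket[-1][0])
    return [min(max_val, r * target_t) for r in radiance]

# Two-image bracket of a 3-pixel scene
bracket = [(1.0, [10.0, 100.0, 255.0]),   # short exposure: dark but unsaturated
           (4.0, [40.0, 255.0, 255.0])]   # long exposure: bright pixels saturate
img = emulate_exposure(bracket, target_t=2.0)
```

Sweeping `target_t` then lets an AE algorithm "choose" exposures offline against recorded data, which is what makes the benchmarking reproducible.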
Computer Vision and Upcoming Years…
Computer Vision is an interdisciplinary field that focuses on enabling computers to interpret and understand visual data from the world around us. The field of computer vision has undergone significant advancements in recent years, especially with the rise of deep learning and the availability of large-scale datasets. The increasing accuracy of computer vision algorithms and the expanding use of computer vision applications across various industries have led to a promising future for computer vision. Another area where computer vision is expected to be increasingly utilized is in healthcare. Computer vision can aid in medical diagnosis by analyzing medical images such as X-rays, CT scans, and MRIs.
VP-SLAM: A Monocular Real-time Visual SLAM with Points, Lines and Vanishing Points
Georgis, Andreas, Mermigkas, Panagiotis, Maragos, Petros
Traditional monocular Visual Simultaneous Localization and Mapping (vSLAM) systems can be divided into three categories: those that use features, those that rely on the image itself, and hybrid models. Among feature-based methods, recent research has evolved to incorporate more information from the environment using geometric primitives beyond points, such as lines and planes. This is because many man-made environments, characterized as Manhattan worlds, are dominated by such primitives. Exploiting these structures can lead to algorithms capable of optimizing the trajectory of a Visual SLAM system and helping to construct a richer map. Thus, we present a real-time monocular Visual SLAM system that incorporates real-time methods for line and vanishing-point (VP) extraction, as well as two strategies that exploit vanishing points to estimate the robot's translation and improve its rotation. In particular, we build on ORB-SLAM2, which is considered a state-of-the-art solution in terms of both accuracy and efficiency, and extend its formulation to handle lines and VPs through two strategies: the first optimizes the rotation, and the second refines the translation from the known rotation. First, we extract VPs using a real-time method and use them for a global rotation optimization strategy. Second, we present a translation estimation method that takes advantage of the last-stage rotation optimization to model a linear system. Finally, we evaluate our system on the TUM RGB-D benchmark and demonstrate that it achieves state-of-the-art results, runs in real time, and performs close to the original ORB-SLAM2 system.
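To illustrate the vanishing-point machinery such systems rely on: in homogeneous coordinates, the line through two points and the intersection of two lines are both cross products, so the VP of two image lines that are parallel in 3D falls out in a few lines of code. This is the generic textbook construction, not the paper's actual extraction method:

```python
def cross(a, b):
    """Cross product of two homogeneous 3-vectors."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def vanishing_point(seg1, seg2):
    """Intersection of two image line segments, each given as two points (x, y, 1)."""
    l1 = cross(*seg1)       # homogeneous line through seg1's endpoints
    l2 = cross(*seg2)       # homogeneous line through seg2's endpoints
    x, y, w = cross(l1, l2)  # intersection of the two lines
    return (x / w, y / w)

# Two images of 3D-parallel edges (e.g., corridor walls) converge at a vanishing point
vp = vanishing_point(((0, 0, 1), (2, 1, 1)),   # line y = x/2
                     ((0, 2, 1), (2, 2, 1)))   # line y = 2
```

Because the directions to such vanishing points depend only on camera rotation, not translation, they provide exactly the kind of drift-free rotation constraint the abstract describes.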
Strategic Radiology
The first in a series of three presentations from Ferrum Health on the use of AI to impact population health, "AI in Oncology" provided Strategic Radiology members a provocative vision of a powerful new role for radiology in the health care enterprise from Elie Balesh, MD, private practice diagnostic and interventional radiologist and medical director of Ferrum Health. "Most health systems and radiology groups in the US have at least considered using AI, but very few, probably less than 5 or 10%, have actually implemented any kind of nuts-and-bolts meaningful AI program," noted Balesh. "While the appetite to test drive AI, potentially even pay for AI, exists across a large majority of these players, very few have put together the business plans, financial models, or earmarked budgets to actually implement AI technology and capitalize on its value." More than 1,200 companies are knocking on health system doors, reaching out to leadership and other stakeholders, trying to get their discrete, short-term solutions purchased, installed, and deployed. "Very few AI vendors provide end-to-end, soup-to-nuts comprehensive clinical and operational workflow solutions, which is what is needed at the enterprise healthcare level," he explained.
Optical illusions could help us build a new generation of AI
You look at an image of a black circle on a grid of circular dots. It resembles a hole burned into a piece of white mesh material, although it's actually a flat, stationary image on a screen or piece of paper. But your brain doesn't comprehend it like that. Responding to the verisimilitude of the effect, the body starts to unconsciously react: the eye's pupils dilate to let more light in, just as they would adjust if you were about to be plunged into darkness to ensure the best possible vision. The effect in question was created by Akiyoshi Kitaoka, a psychologist at Ritsumeikan University in Kobe, Japan.
What is artificial intelligence?
The words "artificial intelligence" (AI) have been used to describe the workings of computers for decades, but the precise meaning has shifted with time. Today, AI describes efforts to teach computers to imitate a human's ability to solve problems and make connections based on insight, understanding and intuition. Artificial intelligence usually encompasses the growing body of cutting-edge work in technology that aims to train systems to accurately imitate or -- in some cases -- exceed the capabilities of humans. Older algorithms, as they grow commonplace, tend to be pushed out of the tent.
Hyperspectral Imaging Enables Improved Machine Vision, Discussed by IDTechEx
Machine vision is increasingly important for many applications, such as object classification. However, relying on conventional RGB imaging is sometimes insufficient – the input images are just too similar, regardless of algorithmic sophistication. Hyperspectral imaging adds the extra dimension of wavelength to conventional images, providing a much richer data set. Rather than expressing an image using red, green, and blue (RGB) values at each pixel location, hyperspectral cameras instead record a complete spectrum at each point to create a 3D data set, sometimes referred to as a hyperspectral data cube. The additional spectral dimension facilitates supervised learning algorithms that can characterize visually indistinguishable objects – capabilities that are highly desirable across multiple application sectors.
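One standard way to exploit the extra spectral dimension, shown here purely as an illustration, is the spectral angle mapper: classify each pixel of the hyperspectral cube by the angle between its spectrum and a set of reference spectra, a comparison that is insensitive to overall brightness. The reference spectra and names below are invented for the example.

```python
import math

def spectral_angle(a, b):
    """Angle between two spectra: a small angle means similar material,
    regardless of illumination level (both spectra are implicitly normalized)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def classify_pixel(spectrum, references):
    """Assign the pixel to the reference material with the smallest spectral angle."""
    return min(references, key=lambda name: spectral_angle(spectrum, references[name]))

# Two materials that could average to similar RGB values but differ across 5 bands
references = {
    "plastic": [0.2, 0.5, 0.8, 0.5, 0.2],
    "leaf":    [0.8, 0.5, 0.2, 0.5, 0.8],
}
pixel = [0.1, 0.26, 0.42, 0.25, 0.1]  # dim, but shaped like the plastic spectrum
label = classify_pixel(pixel, references)
```

This is the kind of per-pixel discrimination of visually indistinguishable objects that the added wavelength dimension makes possible.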